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Abstract 



I show how to avoid a two level nested conjugate gradient procedure in the context 
of Hybrid Monte Carlo with the overlap fermionic action. The resulting procedure is 
quite similar to Hybrid Monte Carlo with domain wall fermions, but is more flexible and 
therefore has some potential worth exploring. 
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By now it is clear that strictly massless QCD can be put on the lattice without 
employing fine tuning [1]. At the moment, all practical ways to do this are theoretically 
based on the overlap [2,3]. The procedures are practical only because the fermionic matrix 
D admits a simple expression in terms of a function e of a very sparse matrix Hw- D, e 
and Hw will be defined below. 

One may wonder why we need to settle for a function £ of a sparse matrix, and not 
use a matrix D which is sparse by itself. The main point is that e is not analytic in its 
argument while Hw is analytic in the link gauge fields, given by unitary matrices U^x), 
where p = 1, 2, 3, 4 denotes a direction and x a lattice site. However, for strictly massless 
fermions, D cannot be analytic in the link variables. Indeed, if it were, we could not have 
eigenvalues of D depending nonanalytically on the link variables: On the lattice a set of 
link variables all set to unity can be smoothly deformed to a good approximation to an 
instanton. In the process of this deformation the eigenvalue of D closest to zero will move 
until some intermediate set of links variables is reached and after that will be stuck at 
zero. More complicated evolutions can also occur, but they all have to be nonanalytic in 
the link variables. Moreover, the lack of analyticity comes from the global structure of the 
gauge background described by the link variables. A sparse and local D cannot provide 
such an eigenvalue movement. In the overlap, the burden of introducing the nonanalyticity 
is carried by the function e. The dynamics can then be relegated to a sparse matrix Hw 
which is local and analytic in the link variables. That only a function of a sparse matrix 
enters makes it still possible to use polynomial Krylov space methods and avoid full storage 
of the fermion matrix. It is well known that full storage is prohibitive at reasonable system 
sizes. 

The explicit form of the massless fermionic matrix D is 

D= 1 -(l + e'e), (1) 

where e' and e are hermitian and square to unity. Thus, V = e'e is unitary. Replacing 1 
by a parameter p slightly larger (smaller) than unity corresponds to giving the fermions a 
positive (negative) mass.* Switching the sign of the physical mass corresponds to replacing 
p by - [3]. D acts on a space of dimension AVn c where V is the number of lattice sites 
(x), 4 is the number of spinorial indices (ot, (3) and the fermions are in the fundamental 
representation of SU(n c ), also carrying a group index i. Usually, one suppresses the 
spinorial and group indices, but indicates the site indices explicitly. One uses the indices 
p, v both for directions and for vectors of length one in the respective direction. 

* The pure gauge field action is assumed to have zero theta-parameter. 
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Clearly, dete = (— l)^ tre and the same is true of e'. The latter is picked so that 
tre' = for all gauge fields. The simplest choice is e' = 75, in which case all the gauge field 
dependence comes in through e. \tre is the topological charge of the background [1,2]. e 
is defined by 

e = 7=f= > H w = 75-Dm/ = H w , 

(D w iP)(x) = V(x)/k - - -y^U^x^x + //) + (! + 7m)^(^ " /^(a - //)] 



where D\y is the Wilson lattice Dirac operator with hopping k in the range (.125, .25). 
The matrices 7^ are Euclidean four by four Dirac matrices acting on spinor indices. 

e is defined for all gauge orbits with H w > 0. It is nonanalytic when Hw has a zero 
eigenvalue. This exclusion of the "zero measure" set of gauge fields where Hw has exact 
zero modes is necessary [1,2], as explained above, in order to cut the space of lattice gauge 
orbits up into different topological sectors. The space of allowed gauge backgrounds has 
also to provide a base manifold capable of supporting the nontrivial U(l) bundles needed 
to reproduce chiral anomalies [4]. 

Saying that the space of forbidden gauge orbits has zero measure is not really sufficient 
to discount other possible consequences of the nonanalyticity of D: for example, one 
may be worried that nonlocal effects are somehow introduced, and the regularized theory 
isn't going to become massless QCD in the continuum limit. That there are no bad 
side consequences of the nonanalyticity is obvious from the following observation: Gauge 
fields with relatively small local curvature (in other words, with all parallel transporters 
round elementary plaquettes close to unitary matrices - note that this is a gauge invariant 
requirement) will produce an H w bounded away from zero. Indeed, the spectrum of Hw is 
gauge invariant and has a gap around zero on the trivial orbit. Thus, the above is evident 
by continuity. (More formal arguments have recently appeared in [5].) The continuum limit 
is dominated by gauge configurations which are far from the excluded backgrounds where 
H w is non-invertible. That this had to be quite obvious follows from the fact that D w , 
by itself, would describe massive fermions on the lattice, and that this mass, controlled by 
the variable k, is kept of order inverse lattice spacing a when a is taken to zero. 

In theory we need e = s(Hw), where e(x) is the sign function giving the sign of x, and 
is nonanalytic at x = 0. However, since Hw will never have exactly zero eigenvalues, a 
numerical implementation of e(x) seems possible. Of course, what determines the level of 
difficulty is how close the numerical implementation of e{x) has to be to the true e{x) for all 
values of x that are possible. The set of values of x we need to consider is the set of values 
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the eigenvalues of H\y can take. In practice, H\y can have eigenvalues close to zero, and, 
since in that vicinity the true e(x) has a jump, the numerical implementation inevitably 
becomes expensive (in cycles) for gauge fields that produce an H\y with numerically tiny 
eigenvalues. Of course, the overall scale of Hw is irrelevant for the sign function, so it is the 
ratio between the largest and smallest eigenvalues (in absolute value) that matters. This is 
the condition number of Hw, and it plays a significant role in all numerical considerations 
that follow. 

The method that seems most promising is to compute the action of e on a vector 
by approximating the sign function s(x) by a ratio of polynomials 

e(x) ^ e n (x) = ^ (3) 

where the deg(Q) = deg(P) + l = 2n. A simple choice, which obeys in addition < 1, 
is [6]: 

e ^- (l + *) 2w -(l-*) 2w -*y I U) 

(1 + *)*» + (1 - n^ iC o^[{s-\)^W + W[{s-\)^] 

This choice also respects the symmetry e(x) = e(^). Thus, it treats the extremities of the 
spectrum of Hw equally. There is a certain advantage in knowing that the approximation 
e n (x) never exceeds unity in absolute magnitude. This ensure that the related approximate 
matrix D never has strictly zero eigenvalues, a source of concern when D itself is inverted, 
something we need to also do. 

It was pointed out in [7] that abandoning |e n (^)| < 1, the quantity max x6 ( a ^ \e(x) — 
e n (x)\ (a < Hw < b) can be minimized for fixed n and the needed n for a given accuracy 
can be reduced relative to (4). It is advantageous to work with a small n. It is not known 
at the moment whether the tradeoff between a smaller n and l^nO^)! < 1 is beneficial. In 
this context let me observe that one can always use (4) with n = 1 on any other sign- 
function approximation of Hw- This doubles the effective n, but reintroduces |£ n (^)| < 1- 
It does not reintroduce the inversion symmetry under x — > 1/x, but the latter may be less 
important in practice. 

Whichever polynomials one uses, the main point is that a fractional decomposition as 
in (4) makes it possible to evaluate the action of e at the rough cost of a single conjugate 
gradient inversion of H w [6]. The parameter n only affects storage requirements, and even 
this can be avoided at the expense of an increase by a factor of order unity in computation 
[8]. Thus, keeping n as small as possible is not necessarily a requirement. On the other 
hand, for the approximation to the sign function to be valid down to small arguments, one 
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shall need to pick relatively large n's and face the slow-down stemming from the single 
conjugate gradient inversions being controlled essentially by the condition number of 
itself, with no help from the n-dependent shift. 

In both types of rational approximants [6,7] there exists a polynomial q of rank n such 

that 

Q(x) = \q(x)\ 2 . (5) 

This equation simply reflects the positivity of Q on the real line; it makes it possible to 
work with Hermitian matrices below. 

In quenched simulations one needs quantities of the form [3] : 

The inversion of D, or e' + e, needs yet another conjugate gradient iteration, and one ends 
up with a two level nested conjugate gradient algorithm. The operator that needs to be 
inverted e' + e, is hermitian and this is a potentially useful property numerically. This 
operator, as seen in the above equation, anticommutes with e' — e, and since the latter is 
generically non-degenerate, has a spectrum which is symmetric about zero. 

The operators H± = \ [e' ± e] have some nice properties so some comments about how 
they relate to D are in order [7,9,11]: Since VH±V = V*H±V* = H±, [H±, V + W] = 
{H±,V- V^} = 0. Note that V + = {e,e'} and V - V+ = [e',e] = ±4H T H±. We 
also know that K = Ker([e,e'}) = Ker(H + ) © Ker(H_) = Ker(l + V) © Ker(l - V). 
On the complement of K, K±, eigenvalues of H+, h, satisfy < \h\ < 1 and come in 
pairs h ± = ±\h\ = ±cos^ with < a < ir. The eigenvalues of H_ are ±sin^ on K± 
in the same subspaces. Corresponding to each eigenvectors/eigenvalues pair of H± are a 
pair of eigenvectors/eigenvalues of V (and V^) with eigenvalues e ±ia . The two pairs of 
eigenvectors are linearly related. States relevant to the continuum limit have a xs ir. These 
features generalize to the massive case, where one has to deal with H ab and H ba , where 
the matrix pencils are defined as H ab = ae + be' with real a, b. 

To see directly why the H± are special we follow [12] and represent them using two 
distinct bases, one associated with e and the other associated with e'\ eipi = e^i, e'tpl = 
Then < ip' i: H±ipj >= e ' 2 €j < ip^ipj >, showing that detH± factorizes [13] since 
matrix elements corresponding to e\ ± e j = vanish. Exactly half of the e- are 1 and the 
rest are — 1. When e is approximated, |e$| will no longer be precisely unity and we get 
some right-left mixing. 

Getting back to our main topic, we have ended up with a nested conjugate gradient 
procedure. This is not prohibitive in the quenched case [9] , but makes the entire approach 
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only tenuously feasible with present computational resources when dynamical simulations 
using Hybrid Monte Carlo are contemplated [10]. 

My objective here is to show that in the context of Hybrid Monte Carlo, a nested 
conjugate gradient procedure can possibly be avoided. Of course, this comes at some cost 
and only future work can tell how well the idea works. At this stage I only wish to draw 
attention to an alternative to using a nested conjugate gradient procedure in simulations 
with dynamical fermions. 

As usual with Hybrid Monte Carlo, we work with an even number of flavors. Obvi- 
ously, 

det D = det(e'D) = det « det ^[75 + s n (H w )}. (7) 



But, 



det - [75 + e n (H w )] = — . (8) 



For example, with the polynomials of (4) we have q(x) = (1 + x) n + i(l — x) n . 

The denominator det[Q] in (8) can be implemented by pseudofermions - by this term 
I mean variables in the functional integrals that carry the same set of indices as fermions 
do, only they are bosonic, so integration over the exponent of a quadratic form in pseud- 
ofermions is restricted to positive kernels, and produces the inverse of the kernel's deter- 
minant. Note that Q > and one does not need an even power here to ensure positivity. 

One does need an even power nevertheless, because it is a requirement embedded in the 
Hybrid Monte Carlo algorithm: In that algorithm, one needs to invert the fermion matrix, 
M, in the course of computing the Hybrid Monte Carlo force. M = q(Hw)^5q' (H\y) + 
P(Hw) is not positive definite, but should be - and this is achieved by doubling the number 
of fermions. In equation (8) I chose to factor the expression in such a way that M come 
out hermitian. I did this because numerical procedures are easier understood theoretically 
when the matrices are hermitian, and also, because this may help to reduce the condition 
number. But, there is no guarantee that it is really beneficial to make M hermitian. 
Therefore let me mention that other factorizations are possible: For example, using the 
approximation in (4), 

det-[ 75 + e n (^)] = det[(1 + gw) 2n + (1 _ gw) 2n ] ' ^ 

The matrix in the denominator is still positive definite, but M is not hermitian now, and 
resembles expressions obtained in the context of another truncation of the overlap [3], 
known as domain wall fermions [14] 
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The appearance of pseudofermions renders this case even closer to so called domain 
wall fermions. The trade-off is between an extra dimension there and the higher degree 
polynomials here. In the present approach there is more flexibility and one does not keep 
unneeded degrees of freedom in memory; still, it would be premature to decide which 
approach is best. Optimizing [7] to make n as low as possible seems now again worthwhile, 
more so that in the quenched case. 

The cost of an M ■ 4> operation is roughly An times the cost of an H w ■ <j> operation. 
The condition number of M may also be larger than that of H w and increase with n. It 
is therefore important to find out what the smallest n one can live with is. It could be 
that it turned out to be too hard to maintain e n (x) a good approximation to e(x) while 
keeping the condition number of M manageable. If one focuses only on the quality of the 
approximation to the sign function it is actually likely that the condition number of M 
will be large* because of the high degrees of the polynomials: Consider ip, a normalized 
eigenstate of Hw with eigenvalue h. We find ip^Mip = P(h) +Q(h)ip^ r y5ip. Both P(h) and 
Q(h) can be very big numbers (for large degree n). In absolute magnitude they are very 
close, this is why the ratio P(h)/Q(h) is close to ±1. But, this cancelation can be easily 
spoiled by the ip^^ip factor, and thus M can have very large eigenvalues. There is little 
reason to hope for M to have no small eigenvalues, so it might be the case that M has 
unacceptable large condition numbers when n is too large. 

I now wish to show that one can try to avoid this latter problem by introducing extra 
fields. This is, I believe, the essential reason why domain wall fermions [3,14] are at all 
practical. 

To understand this additional trick we start from some relatively easily proven iden- 
tities. Consider a fermionic bilinear action So: 

St) = ^75^ + - 4>lAlij + + . . . + (/) n - 1 A n (/) n - (frnAnfin-! + (j) n B n (j) n . (10) 

The fields with bars are rows, the ones without are columns and Ai, Ai, B>i are commuting 
matrices. One can visualize this action as a chain, extending into a new dimension. The 
degrees of freedom we are interested in sit at one end of the chain; these are the tfj, ip fields. 
The <p, <p fields are the extra fields I introduced to handle the condition number problem. 
The idea is to arrange matters so that integrating out all the <p, <p variables will produce 
an action for the variables ijj, iji of the precise rational form we wish. But, the condition 
number that will be relevant numerically, will be the condition number of the bigger kernel 
in Sq, involving all fermionic fields. 

* R. Edwards, private communication. 
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To get the induced action for the fields V>, ip we integrate out the fields 0, <fi starting 
from the other end of the chain. The integration over the pair of fermions at the end of 
the chain produces a factor of (detS n ) in front and adds a piece to the quadratic term 
£? n _i, coupling 4> n -i to n _i, of the form A n A n /B n . Now this can be iterated until the 
last pair of 0, cj> is integrated out. We have obtained the following identity: 

/n 
. ..d$ n d<f> n e s ° = l[(detB t )e^ +R ^, (11) 

i=l 

where, 

R= ^ (12) 



B 2 + 



3^-3 



B 3 + ... 



B n -i + 



A n A n 



B 



n 



Now, the expression for R is recognized as a truncated continued fraction. Any ratio 
of polynomials can be written as a truncated continued fraction by the Euclid algorithm 
(invert, divide with remainder in the denominator and continue). Thus, we learn that any 
fractional approximation we wish to use for the sign function can be mapped into a chain 
with only nearest neighbor interactions. 

Usually, P in (3) is odd and Q is even: P(x) = xPi(x 2 ), Q(x) = Qi(x 2 ). Pi is of 
degree n — 1 and Qi is of degree n. Therefore, the truncated continued fraction has the 
following structure: 

Pi (u) a 



u + Pi H 



(13) 



«2 

u + (3 2 + 



u + /3 3 + ... 



a an ~ 1 

u + ^ + lT+K 

Picking Bi = H^y + Pi, i = 1, 2, . . . , n and AiAi = a§Hw, AiAi = i = 2, 3 ... ,n 

produces the desired expression. Again, one needs pseudofermions to compensate for the 
prefactor f] det(H 2 v + fii) in (11). For the kernels of the pseudofermions to be positive 
definite we need fa > 0. 

A similar trick can be used if one wants to implement q 1 ^) = Z]"=i ■ Now 



The structure is somewhat different 



#0 = + (Xii^Hw^i + ^iip) -J2i 4>i(Hw + Pi)4>i- Again, one needs pseudofermions. 
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One can avoid having the squares of Hw in the chain by more continued fraction 
expansion. As an example, I worked out the explicit map of the approximation e n (x) 
of equation (4) to a chain involving only Hw terms (which are the least expensive to 
implement). To start, I use a formula that goes as far back as Euler: 



e n (x) = 



2nx 



1 + 



(An 2 - l)x' 



(14) 



3 + 



(An 2 - A)x' 



5 + 



[An 2 - (2n - l ) 2 ]x 2 
An-1 



4n-3 + 



Now, I use invariance under inversion of x to move the x factors around. I also change 
some signs to make the expression more symmetrical. The end result for the path integral 
is: 

r dfadfa . . . d0 n d(j) n e s * = (det H w ) 2n e^ l5+£n{Hw))lp , (15) 



To write down the quadratic action S* we introduce the extended fermionic fields x, \\ 

( V \ 



fan), X 



(16) 



\4>2n / 



5* = xH% where the new kernel, H, in block form, has the following structure: 



H 







/ 75 V®o 

\fo[\ —Hw 



\ 






a 2 



H 



w 



v/«2n-l 









\/Ol2n-\ 

—Hw 



\ 



(17) 



The numerical coefficients a are given below: 



(2n-j)(2n + j) 

an = 2n, a, = -, i = 1, z, ... 

J (2j-1)(2j + 1)' 



(18) 



The main point is that the structure of the extended kernels is sufficiently close to five 
dimensional fermions that we can be quite sure that the condition numbers are similar to 
the ones encountered with domain wall fermions. 



9 



Note that we now have a hermitian kernel, H. This would be useful if we wanted 
to use Lanczos techniques to study the entire eigenvalue spectrum of H. Actually, to do 
this in an efficient way (i.e. applying the Cullum-Willoughby method) one needs full 64 
bit precision, to be able to distinguish so called spurious eigenvalues from true ones. One 
cannot get the needed accuracy if one uses a direct evaluation of the action of 75-D. 

One may wonder whether the law of conservation of difficulties is not violated: how 
can it be that we compute the same ipip propagator ( 7B jj. fl is given by the ip-ip block in 
H -1 ) both in the pole method and in the extra fields method, and use conjugate gradient 
algorithms (essentially) in both cases, but hope for big gains in one method relative to the 
other ? At its essence the new trick is algorithmical - basically, the space in which the 
conjugate gradient operates is enlarged when extra fermions are added, and thus, some of 
the difficulties one encounters in the smaller space are avoided. Very roughly, what the 
extra fermions do, is to provide ways around barriers one faces in the original space. This 
is somewhat similar to solving the problem of minimizing a function over a discrete set, 
by first making the set continuous. 

Clearly, n plays a role analogous to the size of the extra dimension, N, in domain wall 
simulations. Thus, these two truncations of the overlap may end up being similar not only 
conceptually, but also numerically. The derivation of our identities went through essentially 
the same steps as those employed in [3] only now in reverse order. When comparing to 
domain wall fermions it becomes apparent that now we can work with a hermitian kernel, 
that we have much more flexibility, and that it is probably possible to exploit any efficient, 
possibly computer architecture dependent, implementation of the action of the lattice 
Wilson-Dirac operator D\y- I have not separated the chiral components of ip,ip here, 
something that is quite natural in the domain wall viewpoint, [3]. Of course, if there is 
an efficiency reason, one could try to separate the chiral components in the more general 
framework presented here, too. 

As always, if any single trick is useful, one usually considers possible combinations. 
By this I mean to exploit both the direct approach based on a sum of pole terms and the 
indirect approach based on extra fields. The essence of the numerical problem is that we 
wish to contract the spectrum of H w to two points, ±1. The negative half of the spectrum 
of H\y is intended to map into —1 and the positive half to +1. Any map that reduces 
the ranges of the negative and positive halves of the spectrum is useful. It produces a 
new Hw that can be used as an argument for a new map, which now can work easier. In 
this way we try to combine the good properties of various maps. Moreover, all that H\y 
needs to be, an this is important, is a reasonable discretization of the continuum Dirac 
operators with a negative mass of order inverse lattice spacing. Actually, one can add as 
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many fermion fields with large positive masses as one wishes. Thus, if we used a few extra 
fields to produce an effective Hyy (say for the end of the chain) which had some reasonable 
spectral properties, we could, conceivably, by adding a parameter p (as discussed at the 
beginning of this paper) make the mass of one fermion negative and use this action as the 
argument of a rational approximation implemented by the pole method. A better behaved 
input would be much easier to handle. Or, we could reverse the procedures. For example 
assume you wish to use a very short chain, say consisting only of ijjij} and <j>4> with an action 
(employing previous notation) xH-X given by: 

Since the quantity (i/jy) 1 / 4 is less violently behaved around zero than e(Hw) a rational 
approximation (or maybe even just a polynomial approximation) might be quite manage- 
able. 

My main message in this paper is that in the context of dynamical fermion simulations 
there are many alternatives and tricks that have not been yet explored, and it might be 
a waste to exclusively focus on the most literal numerical implementations of the recent 
theoretical advances on the topic of chiral symmetry on the lattice. 
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